The advances supported by this award may be roughly divided into three categories:

1. the development of faster algorithms for the standard formulation of the problem of
interpolation on networks,

2.  the  development  of  a  new  approach  to  interpolation  on  networks,  and 

3.  the  development  of  faster  algorithms  for  sparsifying  and  detecting  critical  edges  in  networks. 

This  report  begins  with  a  brief  explanation  of  the  problem  of  interpolation  on  networks,  followed 
by  an  explanation  of  these  advances. 

1  Interpolation  on  Networks 

In  network  interpolation  problems,  one  is  given  a  network  along  with  information  about  some  of  the 
nodes  in  the  network.  Assuming  that  nodes  that  are  connected  by  edges  are  similar,  one  is  asked 
to  guess  the  corresponding  information  for  the  remaining  nodes.  A  benchmark  example  of  such  a 
problem is that of detecting spam webpages [CDB+06]. This problem arose in an effort at Microsoft
to  detect  spam  webpages  that  are  created  solely  to  increase  the  ranking  of  the  pages  they  point  to. 
The  network  in  this  problem  has  one  vertex  for  every  webpage,  and  edges  representing  the  links 
between  the  webpages.  Researchers  at  Microsoft  investigated  many  webpages,  and  determined  for 
each  whether  it  was  spam  or  legitimate.  The  interpolation  problem  is  to  use  these  determinations 
along  with  the  link  structure  to  estimate  the  likelihood  that  other  webpages  are  spam.  This  is 
possible because legitimate webpages are unlikely to link to spam webpages, so a link to a spam
webpage is evidence of spam. Similarly, a link from a legitimate webpage is evidence of legitimacy.

Network  interpolation  problems  arise  in  many  other  contexts  in  Machine  Learning.  One  of 
the  most  famous  examples  comes  from  the  work  of  Zhu,  Ghahramani  and  Lafferty  [ZGL+03]  who 
showed  how  to  convert  many  problems  of  classification  and  regression  in  Machine  Learning  into 
problems  of  interpolation  on  networks. 

2  Faster  Algorithms  for  the  Standard  Interpolation 

The standard method of performing interpolation in networks, which we henceforth call $\ell_2$
minimization, comes from Zhu, Ghahramani and Lafferty [ZGL+03], and only applies to undirected
networks. Formally, one is given a network with vertex set V and edge set E, along with a subset
$S \subset V$ at which the values of the function f to be interpolated are known. The interpolation is
performed by finding the function g that agrees with f on the vertices in S and minimizes the
weighted sum of the squares of the differences across edges:

$$\sum_{(u,v) \in E} w_{u,v} \, (g(u) - g(v))^2, \tag{1}$$

where $w_{u,v}$ is the weight of edge $(u,v)$.

The  problem  of  finding  this  function  g  may  be  reduced  to  the  problem  of  solving  a  system  of 
linear  equations  in  the  Laplacian  matrix  of  the  network.  In  fact,  this  problem  is  mathematically 
identical  to  many  problems  that  arise  in  Computational  Science,  including  the  computation  of 
electrical  flow  and  heat  flow. 
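
The reduction can be made concrete on a small example. Setting the derivative of (1) with respect to g(u) to zero at each vertex u outside S gives one linear equation per such vertex: the weighted degree of u times g(u), minus the weighted sum of g over the neighbors of u, equals zero. Moving the known values to the right-hand side leaves a linear system in the Laplacian restricted to the vertices outside S. The following is a minimal sketch of that reduction, not one of the solvers developed under this award; the function name `l2_interpolate` and its input format are assumptions made for illustration, and the dense elimination it uses takes cubic time.

```python
from fractions import Fraction

def l2_interpolate(n, edges, known):
    """Minimize the sum of w * (g[u] - g[v])**2 over g that agree with `known`.

    n:     number of vertices, named 0 .. n-1
    edges: list of (u, v, w) for an undirected weighted network
    known: dict mapping each vertex of S to its given value
    Assumes every connected component contains at least one vertex of S.
    """
    free = [v for v in range(n) if v not in known]
    index = {v: i for i, v in enumerate(free)}
    k = len(free)
    # Laplacian rows for the free vertices; the last column is the right-hand side.
    A = [[Fraction(0)] * (k + 1) for _ in range(k)]
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            if a in index:
                i = index[a]
                A[i][i] += w                           # weighted degree
                if b in index:
                    A[i][index[b]] -= w                # off-diagonal entry
                else:
                    A[i][k] += w * Fraction(known[b])  # known neighbor goes to the right-hand side
    # Gauss-Jordan elimination with exact rational arithmetic (illustration only).
    for c in range(k):
        p = next(r for r in range(c, k) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        A[c] = [x / A[c][c] for x in A[c]]
        for r in range(k):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    g = dict(known)
    for v, i in index.items():
        g[v] = A[i][k]
    return [g[v] for v in range(n)]

# Path 0 - 1 - 2 - 3 with unit weights and known values at the two ends.
path = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
print(l2_interpolate(4, path, {0: 0, 3: 3}))
```

On the path, the interpolated values at the two interior vertices are 1 and 2, so g is linear between the two known endpoints.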

This award supported the development of three new algorithms for solving this problem. The
first of them appears in the papers [CKP+14, CKM+14]. The randomized algorithm in these papers
solves these problems to accuracy $\epsilon$ in networks with m edges in expected time
$O(m \sqrt{\log m} \log(1/\epsilon))$. This is absurdly fast: for moderate $\epsilon$ it is less time
than would be required to sort the weights of the edges in the network.

Given  the  speed  of  this  algorithm,  one  may  wonder  why  we  would  need  another.  The  answer 
is  that  this  algorithm  is  not  well-suited  to  parallelization.  This  means  that  we  do  not  know  how 
to  accelerate  it  substantially  using  multi-core  processors,  and  that  it  is  ill-suited  for  problems 
whose  magnitude  requires  computation  by  clusters.  For  this  reason  we  began  the  development  of 
algorithms  that  have  efficient  parallel  implementations. 

Previously, all algorithms that solved Laplacian linear systems in nearly linear time employed
two graph theoretic primitives: low stretch spanning trees and graph sparsifiers. While we had no
idea how to compute low stretch spanning trees efficiently in parallel, we thought that it might
be possible to compute graph sparsifiers this way. This motivated us to design an algorithm for
solving Laplacian linear equations that relies only on graph sparsification. The resulting algorithm
appeared in the paper [PS14].

This paper presented the first algorithm for solving Laplacian linear systems in polylogarithmic
parallel time and nearly-linear work. That is, the algorithm requires little computation and is
efficient in parallel. However, the algorithm in this paper is better viewed as a proof of concept
than as something that one should really implement: it requires time cubic in the logarithm of the
condition number of the system to be solved. While this is asymptotically fast in theory, it is too
slow for practical use. The advantages of the algorithm presented in this paper are that:

1.  it  introduced  an  entirely  new  approach  to  the  fast  solution  of  Laplacian  linear  systems, 

2.  this  approach  is  very  easy  to  understand, 

3. it introduced the first efficiently parallelizable algorithms for graph sparsification, and

4. it inspired the development of even better parallel algorithms for graph sparsification [Kou14,
CLM+14].

The simplicity of the algorithm presented in [PS14] has enabled us to improve it to obtain
remarkably fast and efficient parallel algorithms in [LPS]. In this most recent work, we develop
algorithms for solving Laplacian linear systems that run very quickly in parallel. They provide
something that the numerical linear algebra community has been seeking for a long time: sparse
approximate inverses. To explain these, I recall that the classical Gaussian Elimination algorithm
for solving linear equations in a matrix A constructs triangular matrices L and U so that LU = A.
One can then solve a system of linear equations in A by solving equations in L and U. This can
be done quickly if L and U are sparse, as linear equations in triangular matrices can be solved
with a number of computations proportional to their number of nonzero entries. We prove that
every Laplacian matrix A has an approximate LU-factorization in which both L and U have O(n)
nonzero entries, where n is the dimension of A. This provides the first linear time sequential
algorithm for approximately solving systems of equations in Laplacian matrices. Moreover, the
matrices L and U we produce are very special: systems of equations in these matrices can be solved
in parallel time $O(\log n \log\log n)$ and linear work. This is within an $O(\log\log n)$ factor of
the best possible.
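
To illustrate why sparse triangular factors are so useful, the sketch below solves a system in A = LU by one forward substitution and one back substitution, touching each stored nonzero once. It uses an exact factorization of a small tridiagonal matrix rather than the approximate factorizations of [LPS], and it is sequential rather than parallel; the row-dictionary storage format and the names `solve_lower` and `solve_upper` are assumptions made for illustration.

```python
def solve_lower(rows, b):
    """Forward substitution for L x = b.

    rows[i] is a dict {j: L[i][j]} holding only the nonzeros of row i,
    with j <= i and a nonzero diagonal entry rows[i][i].
    """
    x = [0.0] * len(b)
    for i, row in enumerate(rows):
        s = b[i] - sum(val * x[j] for j, val in row.items() if j != i)
        x[i] = s / row[i]
    return x

def solve_upper(rows, b):
    """Back substitution for U x = b, with rows stored as in solve_lower (j >= i)."""
    x = [0.0] * len(b)
    for i in reversed(range(len(b))):
        row = rows[i]
        s = b[i] - sum(val * x[j] for j, val in row.items() if j != i)
        x[i] = s / row[i]
    return x

# Exact factors of A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]], so that L U = A.
L = [{0: 1.0}, {0: -0.5, 1: 1.0}, {1: -2.0 / 3.0, 2: 1.0}]
U = [{0: 2.0, 1: -1.0}, {1: 1.5, 2: -1.0}, {2: 4.0 / 3.0}]
b = [1.0, 0.0, 1.0]

# Solve A x = b by solving L y = b and then U x = y.
x = solve_upper(U, solve_lower(L, b))
print(x)
```

Each solve performs one multiplication per off-diagonal nonzero and one division per row, which is the sense in which the cost is proportional to the number of nonzero entries.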

As  is  the  case  with  Gaussian  Elimination,  it  takes  us  longer  to  compute  these  matrices  L  and  U 
than  it  does  to  use  them  to  solve  systems  of  linear  equations.  However,  if  we  are  willing  to  settle  for 
slightly  worse  L  and  U,  we  can  accelerate  the  computation.  Our  presently  best  algorithm  computes 
the matrices L and U and applies them to solve a linear system in parallel time $O(\log^6 n)$. While
this  is  still  too  slow  to  be  practical,  we  are  optimistic  that  it  will  eventually  be  possible  to  improve 
it  to  make  it  practical. 

3  Sparsification  and  Significant  Edge  Detection 

The fast parallel algorithms that we describe for solving systems of equations in Laplacian matrices
require fast parallel algorithms for network sparsification. Sparsification is the process of finding
a sparse network that approximates a given network. We use the notion of spectral sparsification.
Thus, the sparse networks we produce have approximately the same community structures as the
original, and the solutions to interpolation problems in the sparse approximations are similar to
those in the original (at least for the $\ell_2$ minimization described above).

Sparse approximations of networks are often desirable because they take less space to store.
One may wonder why this is useful, as the networks we encounter are often sparse. The answer is
that sparsification allows us to compactly store information about a network that can take a long
time to compute. For example, we might want to keep track of all pairs of vertices that are within
distance 2, 3, or even 4 of each other. In [PS14], we quickly compute sparsifiers that enable us to
approximate all of these distance-k networks.

Our  sparsification  algorithms  have  two  steps:  we  first  assign  a  significance  to  every  edge  in  a 
network,  where  we  measure  the  significance  of  an  edge  by  how  useful  it  is  to  communication  in  a 
network.  The  most  significant  edges  are  those  whose  removal  would  disconnect  the  network.  We 
then  construct  a  sparse  approximation  of  the  network  by  randomly  sampling  the  edges  according  to 
their  significance.  Thus,  the  first  step  of  our  algorithm  is  a  fast  parallel  procedure  for  approximating 
the  significance  of  every  edge. 
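
The second step can be sketched as follows, assuming the significance of every edge has already been computed and scaled to serve as a sampling probability. The sketch keeps each edge independently with that probability and divides the weight of every kept edge by the probability, so that each edge weight is preserved in expectation. This reweighting is the standard device in sampling-based sparsification and is an assumption here, as are the name `sample_sparsifier` and the example significance values; the procedure we use to compute significances is not shown.

```python
import random

def sample_sparsifier(edges, significance, rng):
    """Sample edges according to their significance.

    edges:        list of (u, v, w)
    significance: one nonnegative score per edge, used as a keep-probability
                  after capping at 1 (hypothetical input for this sketch)
    rng:          a random.Random instance
    """
    kept = []
    for (u, v, w), s in zip(edges, significance):
        p = min(1.0, s)
        if rng.random() < p:
            # Dividing by p keeps the expected weight of the edge equal to w.
            kept.append((u, v, w / p))
    return kept

# A triangle on vertices 0, 1, 2 with a pendant edge to vertex 3.
# Removing edge (2, 3) disconnects the network, so it gets significance 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
significance = [0.5, 0.5, 0.5, 1.0]

sparse = sample_sparsifier(edges, significance, random.Random(0))
print(sparse)
```

In this example the pendant edge is always kept with its original weight, while each triangle edge is kept half of the time with its weight doubled.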

4  A  Better  Approach  to  Interpolation 

The  biggest  advance  supported  by  this  award  is  the  development  of  a  new  approach  to  performing 
interpolation in networks, which we call “Lipschitz Learning”, and which overcomes three
disadvantages of $\ell_2$ minimization:

1. $\ell_2$ minimization can only be applied in undirected networks,

2. empirically, $\ell_2$ minimization has been shown to have poor performance in large networks when
the set of vertices at which the function is known is small, and

3. in $\ell_2$ minimization there is no easy way to compensate for errors in the edges of the network.

Lipschitz learning overcomes all three of these problems.
In our paper on Lipschitz learning [KRSS15a], we both define this new approach to interpolation
on networks and develop reasonably fast algorithms. While these algorithms are not nearly as fast
as those that we have developed for $\ell_2$ minimization, they are a good start: we can perform the
interpolation  on  networks  with  millions  of  nodes  in  a  few  minutes.  We  have  made  an  implementation 
of  our  algorithms  available  on  GitHub  [KRSS15b].  Our  algorithms  for  dealing  with  errors  in  the 
input  network  and  function  values  actually  use  the  Laplacian  linear  system  solvers. 

4.1  Lipschitz  Learning 

There are three distinct ways of defining our approach to Lipschitz learning. The easiest way to
think of it is that instead of minimizing (1), it begins by finding the function g that agrees with f
on S and minimizes

$$\max_{(u,v) \in E} w_{u,v} \, |g(u) - g(v)|. \tag{2}$$

As this does not lead to a unique function g, we seek the function that minimizes the second-largest
of these quantities among those that minimize the maximum, and so on. The result is
sometimes called the Absolutely Minimal Lipschitz Extension. We call it the lex minimizer.

Another way of defining it is to consider the problem of minimizing Laplacian p-norms
introduced in [BZ13]:

$$\sum_{(u,v) \in E} \left( w_{u,v} \, |g(u) - g(v)| \right)^p, \tag{3}$$

over all functions g that agree with f on S. The lex minimizer is the limit as p grows large of the
function g that agrees with f on S and minimizes (3). However, we can compute the lex minimizer
much faster than one can solve (3).

The third way of defining it is by analogy to one of the characterizations of the solution to (1).
The minimizer of (1) for an unweighted network is the function g that agrees with f on S such
that for every vertex not in S, the value of g at that vertex is the average of the values of g at its
neighbors. In unweighted networks, the lex minimizer is the function g such that at every vertex
not in S, the value of g is the average of the minimum and maximum values at its neighbors.
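
The two characterizations can be compared side by side with a naive fixed-point iteration that repeatedly resets the value at each vertex outside S, either to the average of its neighbors or to the midpoint of their minimum and maximum. This is only a sketch for small unweighted examples and is not the algorithm of [KRSS15a]; it makes no claim about how quickly such an iteration converges in general. The names `fixed_point_interpolate`, `average` and `midrange` are assumptions made for illustration, and every vertex outside S is assumed to have at least one neighbor.

```python
def fixed_point_interpolate(n, edges, known, combine, sweeps=1000):
    """Repeatedly reset each vertex outside S to combine(values at its neighbors).

    n:       number of vertices, named 0 .. n-1
    edges:   list of (u, v) for an undirected, unweighted network
    known:   dict mapping each vertex of S to its given value
    combine: function from a list of neighbor values to a single value
    """
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # Vertices outside S start at 0; vertices in S never change.
    g = [float(known.get(v, 0.0)) for v in range(n)]
    for _ in range(sweeps):
        for v in range(n):
            if v not in known:
                g[v] = combine([g[u] for u in nbrs[v]])
    return g

def average(vals):
    """Neighbor rule satisfied by the minimizer of (1)."""
    return sum(vals) / len(vals)

def midrange(vals):
    """Neighbor rule satisfied by the lex minimizer."""
    return (min(vals) + max(vals)) / 2

# Vertex 3 is unlabeled; its neighbors 0, 1, 2 carry the values 0, 0, 3.
star = [(0, 3), (1, 3), (2, 3)]
known = {0: 0.0, 1: 0.0, 2: 3.0}
g_avg = fixed_point_interpolate(4, star, known, average)   # g_avg[3] is 1.0
g_lex = fixed_point_interpolate(4, star, known, midrange)  # g_lex[3] is 1.5
print(g_avg[3], g_lex[3])
```

The example shows how the two rules differ: the average is pulled toward the value that occurs twice among the neighbors, while the midpoint rule depends only on the extreme values.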

All  these  definitions  can  be  naturally  extended  to  directed  networks. 

4.2  Algorithms  and  Results 

We observe experimentally that lex minimizers give much better predictions than $\ell_2$ minimization
when the label set S is small. The most interesting example is the webspam data set, for which we
obtain much better results than the previous algorithms [ZBT07].

One  of  the  big  advantages  of  lex  minimizers  over  the  standard  approach  of  minimizing  (1)  is 
that they allow us to easily compensate for noise both in the values of f on S and in the actual
edges of the network. We develop a fast algorithm to minimize (2) subject to a budget of changes
to edge weights and values of f on S. We do this by formulating the resulting problem as a linear
program,  and  by  then  designing  a  custom  interior  point  method  for  solving  the  linear  program.  We 
prove  that  the  interior  point  method  requires  few  iterations,  and  that  the  dominant  cost  of  each 
iteration  is  the  solution  of  a  system  of  linear  equations  in  a  Laplacian  matrix. 

We also show how to perform a rather surprising form of outlier removal in polynomial time: we can
minimize (2) subject to the removal of a given number of vertices from S. That is, we can
compensate for the possibility that some of the values are extremely wrong. While we do not yet know
a fast polynomial time algorithm for this task, we are optimistic that we may one day find one. It
is particularly surprising that such an algorithm exists, as we show that the analogous problem for
(1) is NP-complete.

5  Isotonic  Regression 

We  have  built  on  the  techniques  we  developed  in  our  work  on  Lipschitz  learning  to  design  the 
fastest algorithms for isotonic regression [KRS], a problem that has been studied since the 1950s.
Isotonic  regression  problems  are  another  type  of  inference  problem  on  networks.  They  are  specified 
by a directed acyclic network in which every node is associated with a real variable. The directed
edges specify inequalities that the variables are known to satisfy: an edge from node u to node v
indicates that the variable at node v must exceed the variable at node u. In addition, an estimate
of  every  variable  is  provided.  The  isotonic  regression  problem  is  to  compute  values  of  the  variables 
that  come  as  close  as  possible  to  the  given  estimates  subject  to  the  inequalities  dictated  by  the 
network. 
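
For intuition, the simplest instance of this problem is a chain, in which node i has a single edge to node i + 1, with closeness measured by the sum of squared differences. That special case is solved by the classical pool-adjacent-violators procedure, sketched below. This is not one of the algorithms of [KRS], which handle general directed acyclic networks and other norms; the function name `isotonic_chain` is an assumption made for illustration, and the inequalities are treated as weak (each value is at least the one before it).

```python
def isotonic_chain(y):
    """Least-squares nondecreasing fit to the estimates y along a chain.

    Classical pool-adjacent-violators: scan left to right, and whenever a
    block of nodes has a larger mean than the block after it, merge the two
    and replace both by their common mean.
    """
    blocks = []  # each block is [sum of estimates, number of nodes]
    for value in y:
        blocks.append([float(value), 1])
        # Compare means by cross-multiplying: sum_a / n_a > sum_b / n_b.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit

# The estimates 3 and 2 violate the ordering, so they are pooled to 2.5.
print(isotonic_chain([1, 3, 2, 4]))
```

The fitted values agree with the estimates wherever the estimates already respect the ordering, and replace each violating run by its mean.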

Different measures of “close” provide very different computational problems. When closeness is
measured in the infinity norm or the lexicographic infinity norm, we reduce the problem to that of
computing lex minimizers. For $\ell_p$ norms, we solve the problem by specially designed interior
point methods. In fact, these are the first fast algorithms for the problem for p other than 1 and 2.